[GitHub] [incubator-iceberg] waterlx commented on issue #457: DataFrame generated by Seq() might have schema conflict with Iceberg

2019-09-24 GitBox
waterlx commented on issue #457: DataFrame generated by Seq() might have schema 
conflict with Iceberg
URL: 
https://github.com/apache/incubator-iceberg/issues/457#issuecomment-534483177
 
 
   @sujithjay, thanks for taking care of this!
   I see your point about changing the Iceberg schema, but what if I do not want 
to make that field "optional" and want to keep it "required"?
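The conflict in this thread appears to come down to nullability. A DataFrame built with Seq() can mark a column as nullable, and a nullable input column cannot be written into a required Iceberg column, while the reverse is allowed. The simplified model of that rule below uses a hypothetical `Field` class and `writeErrors` method, not Iceberg's schema API.

```java
import java.util.ArrayList;
import java.util.List;

public class RequiredFieldSketch {
  // Hypothetical, simplified field model (not Iceberg's Types.NestedField)
  static final class Field {
    final String name;
    final boolean required;

    Field(String name, boolean required) {
      this.name = name;
      this.required = required;
    }
  }

  // Reports each table field that is required while the same-named input field is optional.
  // An optional table field accepts either kind of input; a required one accepts only required input.
  static List<String> writeErrors(List<Field> table, List<Field> incoming) {
    List<String> errors = new ArrayList<>();
    for (Field t : table) {
      for (Field in : incoming) {
        if (in.name.equals(t.name) && t.required && !in.required) {
          errors.add(t.name + " is required in the table but nullable in the input");
        }
      }
    }
    return errors;
  }
}
```

Under this model, keeping the column "required" means the incoming column must also be declared non-nullable; relaxing the table column to "optional" is the other way to make the check pass.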


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [incubator-iceberg] waterlx removed a comment on issue #492: Should issue an error/warning message when no data file to delete

2019-09-24 GitBox
waterlx removed a comment on issue #492: Should issue an error/warning message 
when no data file to delete
URL: 
https://github.com/apache/incubator-iceberg/issues/492#issuecomment-534425688
 
 



[GitHub] [incubator-iceberg] waterlx commented on issue #492: Should issue an error message when no data file to delete

2019-09-24 GitBox
waterlx commented on issue #492: Should issue an error message when no data 
file to delete
URL: 
https://github.com/apache/incubator-iceberg/issues/492#issuecomment-534425688
 
 
   I used the following API to try to delete a data file:
   ```
   // the path string must match the data file path recorded in table metadata
   table.newDelete().deleteFile("file:///path1").commit();
   ```
   and got the following log message: `INFO SnapshotProducer: Committed 
snapshot x (StreamingDelete)`
   I thought the deletion had succeeded, but it had not, because the data file's 
path is "file:/path1", not "file:///path1".
   The [file URI scheme](https://en.wikipedia.org/wiki/File_URI_scheme) is a 
separate topic, but it would be better to issue an error or warning here when 
the files to delete are not among the table's data files.
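Since the delete silently matched nothing, the comparison evidently treats "file:///path1" and "file:/path1" as different strings even though both name the same local file. As a caller-side workaround, a path could be normalized before calling `deleteFile`. The `toSingleSlashForm` helper below is hypothetical and not part of Iceberg; it assumes the table recorded its paths in the "file:/..." form.

```java
import java.net.URI;

public class FileUriSketch {
  // Hypothetical helper: rewrites "file:///p" (empty authority) into "file:/p".
  // Other schemes, and file URIs that name a host, are returned unchanged.
  static String toSingleSlashForm(String location) {
    URI uri = URI.create(location);
    if (!"file".equals(uri.getScheme()) || uri.getRawAuthority() != null) {
      return location;
    }
    // getRawPath keeps any percent-encoding exactly as written
    return "file:" + uri.getRawPath();
  }
}
```

Both spellings parse to the same path, "/path1", so normalizing to one form lets an exact string match succeed.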



[GitHub] [incubator-iceberg] waterlx opened a new issue #493: concurrent delete

2019-09-24 GitBox
waterlx opened a new issue #493: concurrent delete
URL: https://github.com/apache/incubator-iceberg/issues/493
 
 
   



[GitHub] [incubator-iceberg] waterlx commented on issue #493: Concurrently delete a data file

2019-09-24 GitBox
waterlx commented on issue #493: Concurrently delete a data file
URL: 
https://github.com/apache/incubator-iceberg/issues/493#issuecomment-534543516
 
 
   @rdblue, could you please review my findings at your earliest convenience?



[GitHub] [incubator-iceberg] waterlx commented on issue #492: Should issue an error/warning message when no data file to delete

2019-09-24 GitBox
waterlx commented on issue #492: Should issue an error/warning message when no 
data file to delete
URL: 
https://github.com/apache/incubator-iceberg/issues/492#issuecomment-534545600
 
 
   @rdblue, does this make sense to you?



[GitHub] [incubator-iceberg] rdblue commented on issue #493: Concurrently delete a data file

2019-09-24 GitBox
rdblue commented on issue #493: Concurrently delete a data file
URL: 
https://github.com/apache/incubator-iceberg/issues/493#issuecomment-534648500
 
 
   This doesn't fail because deletes are idempotent. You tell Iceberg to delete 
with a filter or to delete a specific file, and that data will not be in the 
table after the delete commits.
   
   That's also why you see multiple delete operations. You ran 2 successful 
deletes on the table.
   
   The two metadata files you posted are two attempts to delete at the same 
time. That's why they use the same parent snapshot ID. The metastore must 
guarantee an atomic swap, which is what produces a linear history of changes. 
So only one of these two attempts can have committed; the other would have had 
to retry.
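The behavior described above can be modeled as optimistic concurrency: each writer plans a new snapshot on top of the one it read, then commits with a compare-and-swap on the table's current pointer. The sketch below uses an `AtomicReference` as a stand-in for the metastore's atomic swap. The `Snapshot` class and the method names are hypothetical and far simpler than Iceberg's actual commit path.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReference;

public class AtomicSwapSketch {
  // Hypothetical, simplified snapshot: an id, its parent's id, and the live data files
  static final class Snapshot {
    final long id;
    final Long parentId; // null for the initial state
    final Set<String> files;

    Snapshot(long id, Long parentId, Set<String> files) {
      this.id = id;
      this.parentId = parentId;
      this.files = Collections.unmodifiableSet(new HashSet<>(files));
    }
  }

  // Stands in for the metastore pointer that must support an atomic swap
  private final AtomicReference<Snapshot> current;
  private final AtomicLong nextId = new AtomicLong(1);

  AtomicSwapSketch(Set<String> files) {
    current = new AtomicReference<>(new Snapshot(0, null, files));
  }

  Snapshot current() {
    return current.get();
  }

  // Plans a delete on top of base; removing an absent path is not an error (idempotent)
  Snapshot planDelete(Snapshot base, String path) {
    Set<String> remaining = new HashSet<>(base.files);
    remaining.remove(path);
    return new Snapshot(nextId.getAndIncrement(), base.id, remaining);
  }

  // Succeeds only if the table still points at base
  boolean tryCommit(Snapshot base, Snapshot next) {
    return current.compareAndSet(base, next);
  }

  // Retry loop: re-read the latest snapshot and re-apply the delete until the swap succeeds
  Snapshot commitDelete(String path) {
    while (true) {
      Snapshot base = current.get();
      Snapshot next = planDelete(base, path);
      if (tryCommit(base, next)) {
        return next;
      }
    }
  }
}
```

If two deletes are planned against the same base, both new snapshots share a parent ID, just like the two metadata files in the report. Only the first swap wins. The second delete succeeds only after it retries against the new snapshot, where removing the already-deleted file is a no-op.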



[GitHub] [incubator-iceberg] rdblue commented on issue #492: Should issue an error/warning message when no data file to delete

2019-09-24 GitBox
rdblue commented on issue #492: Should issue an error/warning message when no 
data file to delete
URL: 
https://github.com/apache/incubator-iceberg/issues/492#issuecomment-534652798
 
 
   We could add an option to StreamingDelete to validate that some data is 
deleted, if you wanted a failure instead of an idempotent delete. Other 
operations configure validations that are exposed by the base class, 
MergingSnapshotProducer. We could add a validation method here as well.
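The proposed opt-in validation can be sketched as a builder flag that turns an idempotent no-op delete into a failure. Everything here is hypothetical and for illustration only. In particular, `ValidatingDeleteSketch` and `validateSomethingDeleted` are not Iceberg's `StreamingDelete` or `MergingSnapshotProducer` API, and this sketch matches paths by exact string equality.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ValidatingDeleteSketch {
  private final Set<String> tableFiles;
  private final List<String> pathsToDelete = new ArrayList<>();
  private boolean failIfNothingDeleted = false;

  ValidatingDeleteSketch(Set<String> tableFiles) {
    this.tableFiles = new HashSet<>(tableFiles);
  }

  ValidatingDeleteSketch deleteFile(String path) {
    pathsToDelete.add(path);
    return this;
  }

  // Hypothetical opt-in validation; without it the delete stays idempotent
  ValidatingDeleteSketch validateSomethingDeleted() {
    failIfNothingDeleted = true;
    return this;
  }

  // Returns the files that remain after the delete
  Set<String> commit() {
    Set<String> remaining = new HashSet<>(tableFiles);
    boolean changed = remaining.removeAll(pathsToDelete);
    if (failIfNothingDeleted && !changed) {
      throw new IllegalStateException("No data files matched: " + pathsToDelete);
    }
    return remaining;
  }
}
```

Because the check is off by default, existing callers keep the idempotent behavior. Only callers who opt in get a failure when, as in the report, "file:///path1" matches nothing.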



[GitHub] [incubator-iceberg] rdblue commented on issue #430: Support bucket table for Iceberg

2019-09-24 GitBox
rdblue commented on issue #430: Support bucket table for Iceberg
URL: 
https://github.com/apache/incubator-iceberg/issues/430#issuecomment-534654396
 
 
   @jerryshao, yes, that's correct.
   
   That's why we need to expose the transformation functions to Spark via 
FunctionCatalog, and add the ability for DSv2 sources to set distribution and 
ordering requirements with those functions.
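The transformation functions at issue here include Iceberg's bucket transform. According to the Iceberg table spec, it computes `(murmur3_x86_32(v) & Integer.MAX_VALUE) % N`, hashing int and long values as 8-byte little-endian longs. For Spark to cluster rows into the same buckets, it would need to evaluate exactly this function. The standalone sketch below follows that formula; the class and method names are hypothetical, and this is not Iceberg's `Transforms` implementation.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class BucketSketch {
  // MurmurHash3 x86 32-bit with seed 0
  static int murmur3x86_32(byte[] data) {
    final int c1 = 0xcc9e2d51;
    final int c2 = 0x1b873593;
    int h = 0;
    int nblocks = data.length / 4;
    ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
    for (int i = 0; i < nblocks; i++) {
      int k = buf.getInt(4 * i);
      k *= c1;
      k = Integer.rotateLeft(k, 15);
      k *= c2;
      h ^= k;
      h = Integer.rotateLeft(h, 13);
      h = h * 5 + 0xe6546b64;
    }
    // Remaining 1 to 3 bytes, if any
    int tail = nblocks * 4;
    int k1 = 0;
    switch (data.length & 3) {
      case 3:
        k1 ^= (data[tail + 2] & 0xff) << 16; // fall through
      case 2:
        k1 ^= (data[tail + 1] & 0xff) << 8; // fall through
      case 1:
        k1 ^= data[tail] & 0xff;
        k1 *= c1;
        k1 = Integer.rotateLeft(k1, 15);
        k1 *= c2;
        h ^= k1;
        break;
      default:
        break;
    }
    // Finalization mix
    h ^= data.length;
    h ^= h >>> 16;
    h *= 0x85ebca6b;
    h ^= h >>> 13;
    h *= 0xc2b2ae35;
    h ^= h >>> 16;
    return h;
  }

  // Bucket for an int or long value, hashed as an 8-byte little-endian long
  static int bucket(long value, int numBuckets) {
    byte[] bytes = ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN).putLong(value).array();
    return (murmur3x86_32(bytes) & Integer.MAX_VALUE) % numBuckets;
  }
}
```

Masking with `Integer.MAX_VALUE` clears the sign bit, so the bucket is never negative.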



[GitHub] [incubator-iceberg] rdblue commented on issue #93: [MINOR] Throw an exception early if there is no dict available instead of handling dictionary-encoded pages for absent dict

2019-09-24 GitBox
rdblue commented on issue #93: [MINOR] Throw an exception early if there is no 
dict available instead of handling dictionary-encoded pages for absent dict
URL: https://github.com/apache/incubator-iceberg/pull/93#issuecomment-534786457
 
 
   Looks good to me. If `dict` returned null, further uses would fail with a 
null pointer exception. So this should be safe.
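The change reviewed here replaces a late NullPointerException with an early, explicit error. The sketch below shows that fail-fast pattern. It uses a plain `String[]` as a stand-in dictionary, and the `decode` method is hypothetical; it is not Iceberg's or Parquet's reader code.

```java
import java.util.ArrayList;
import java.util.List;

public class DictionaryCheckSketch {
  // Hypothetical decoder for a dictionary-encoded page: each id indexes into the dictionary
  static List<String> decode(String[] dictionary, int[] ids) {
    if (dictionary == null) {
      // Fail early with a descriptive error instead of a NullPointerException at first use
      throw new IllegalStateException("Page is dictionary-encoded but no dictionary is available");
    }
    List<String> values = new ArrayList<>(ids.length);
    for (int id : ids) {
      values.add(dictionary[id]);
    }
    return values;
  }
}
```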



[GitHub] [incubator-iceberg] rdblue commented on issue #432: Allow writers to control size of files generated

2019-09-24 GitBox
rdblue commented on issue #432: Allow writers to control size of files generated
URL: https://github.com/apache/incubator-iceberg/pull/432#issuecomment-534785869
 
 
   Thanks for fixing this, @xabriel! I'll merge it.



[GitHub] [incubator-iceberg] rdblue merged pull request #432: Allow writers to control size of files generated

2019-09-24 GitBox
rdblue merged pull request #432: Allow writers to control size of files 
generated
URL: https://github.com/apache/incubator-iceberg/pull/432
 
 
   



[GitHub] [incubator-iceberg] holdenk opened a new issue #494: TestScanSummary appears to be flaky

2019-09-24 GitBox
holdenk opened a new issue #494: TestScanSummary appears to be flaky
URL: https://github.com/apache/incubator-iceberg/issues/494
 
 
   I was building Apache Iceberg for the first time and ran into a test 
failure in `TestScanSummary`. When I re-ran the test with debugging information 
on, or on its own, it passed. Looking at the existing fix, which waits for the 
system clock to advance by at least one tick, I think the test may need more 
separation between timestamps to avoid being flaky.
   
   More generally, I'm finding conflicting information online about whether the 
way we use currentTimeMillis is safe; for example, it may not be monotonically 
increasing, though that might only apply to really old kernels.
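Two documented facts about `System.currentTimeMillis()` are relevant here. According to its Javadoc, the granularity of its value depends on the operating system and may be larger than one millisecond. It also reads the wall clock, whereas `System.nanoTime()` is intended for measuring elapsed time. The test helper sketched below waits for a configurable gap rather than a single tick. The `waitUntilAtLeast` name and its parameters are hypothetical, not the helper the existing test uses.

```java
public class ClockWaitSketch {
  // Hypothetical test helper: blocks until System.currentTimeMillis() is at least
  // minGapMillis past timestampMillis, then returns the observed time.
  // A gap larger than 1 ms tolerates coarse clock granularity.
  static long waitUntilAtLeast(long timestampMillis, long minGapMillis) {
    long target = timestampMillis + minGapMillis;
    long now = System.currentTimeMillis();
    while (now < target) {
      try {
        Thread.sleep(1);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException("Interrupted while waiting for the clock", e);
      }
      now = System.currentTimeMillis();
    }
    return now;
  }
}
```

If the wall clock is stepped backwards while waiting, the loop simply keeps waiting until the target time is reached again.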



[GitHub] [incubator-iceberg] aokolnychyi commented on issue #430: Support bucket table for Iceberg

2019-09-24 GitBox
aokolnychyi commented on issue #430: Support bucket table for Iceberg
URL: 
https://github.com/apache/incubator-iceberg/issues/430#issuecomment-534852783
 
 
   @dbtsai, FYI