Re: [PR] Remove redundant getSnapshotAt calls per commit [incubator-xtable]
the-other-tim-brown commented on PR #791: URL: https://github.com/apache/incubator-xtable/pull/791#issuecomment-3937667388 Thanks @brishi19791 for the optimizaiton! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] Remove redundant getSnapshotAt calls per commit [incubator-xtable]
the-other-tim-brown merged PR #791: URL: https://github.com/apache/incubator-xtable/pull/791 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] Remove redundant getSnapshotAt calls per commit [incubator-xtable]
brishi19791 commented on code in PR #791:
URL: https://github.com/apache/incubator-xtable/pull/791#discussion_r2804569193
##
xtable-core/src/main/java/org/apache/xtable/delta/DeltaTableExtractor.java:
##
@@ -44,6 +44,10 @@ public class DeltaTableExtractor {
public InternalTable table(DeltaLog deltaLog, String tableName, Long
version) {
Snapshot snapshot = deltaLog.getSnapshotAt(version, Option.empty());
+return table(snapshot, tableName);
+ }
+
+ public InternalTable table(Snapshot snapshot, String tableName) {
Review Comment:
thanks for the feedback Ashvin. added javadoc and validator
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] Remove redundant getSnapshotAt calls per commit [incubator-xtable]
ashvina commented on code in PR #791:
URL: https://github.com/apache/incubator-xtable/pull/791#discussion_r2767163196
##
xtable-core/src/main/java/org/apache/xtable/delta/DeltaTableExtractor.java:
##
@@ -44,6 +44,10 @@ public class DeltaTableExtractor {
public InternalTable table(DeltaLog deltaLog, String tableName, Long
version) {
Snapshot snapshot = deltaLog.getSnapshotAt(version, Option.empty());
+return table(snapshot, tableName);
+ }
+
+ public InternalTable table(Snapshot snapshot, String tableName) {
Review Comment:
Consider adding a brief Javadoc on the new public api `table(Snapshot,
String)` explaining assumptions (`snapshot` must be valid and correspond to
`tableName`’s table).
Or, add a input validator
```
private static Snapshot requireMetadata(Snapshot s, String tableName) {
if (s == null || s.metadata() == null || s.metadata().schema() == null) {
throw new IllegalStateException("Missing metadata/schema for table: " +
tableName
+ " at version: " + (s != null ? s.version() : "unknown"));
}
return s;
}
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] Remove redundant getSnapshotAt calls per commit [incubator-xtable]
brishi19791 commented on code in PR #791:
URL: https://github.com/apache/incubator-xtable/pull/791#discussion_r2754967531
##
xtable-core/src/main/java/org/apache/xtable/delta/DeltaConversionSource.java:
##
@@ -85,16 +85,15 @@ public InternalTable getTable(Long version) {
@Override
public InternalTable getCurrentTable() {
-DeltaLog deltaLog = DeltaLog.forTable(sparkSession, basePath);
Snapshot snapshot = deltaLog.snapshot();
-return getTable(snapshot.version());
+return tableExtractor.table(snapshot, tableName);
}
@Override
public InternalSnapshot getCurrentSnapshot() {
DeltaLog deltaLog = DeltaLog.forTable(sparkSession, basePath);
Review Comment:
yes
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] Remove redundant getSnapshotAt calls per commit [incubator-xtable]
kevinjqliu commented on code in PR #791:
URL: https://github.com/apache/incubator-xtable/pull/791#discussion_r2752061685
##
xtable-core/src/main/java/org/apache/xtable/delta/DeltaConversionSource.java:
##
@@ -85,16 +85,15 @@ public InternalTable getTable(Long version) {
@Override
public InternalTable getCurrentTable() {
-DeltaLog deltaLog = DeltaLog.forTable(sparkSession, basePath);
Snapshot snapshot = deltaLog.snapshot();
-return getTable(snapshot.version());
+return tableExtractor.table(snapshot, tableName);
}
@Override
public InternalSnapshot getCurrentSnapshot() {
DeltaLog deltaLog = DeltaLog.forTable(sparkSession, basePath);
Review Comment:
```suggestion
```
not needed
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] Remove redundant getSnapshotAt calls per commit [incubator-xtable]
kevinjqliu commented on PR #791: URL: https://github.com/apache/incubator-xtable/pull/791#issuecomment-3831985828 > We now fetch the snapshot once per commit/version and reuse it to construct the InternalTable (via a DeltaTableExtractor.table(Snapshot, tableName) overload), instead of re-resolving the same snapshot multiple times. Thats awesome, thanks for adding the perf improvement. It might be a good idea to add a test (either here or a follow up PR) to verify that the entire conversion process should only read the DeltaLog once. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
[PR] Remove redundant getSnapshotAt calls per commit [incubator-xtable]
brishi19791 opened a new pull request, #791: URL: https://github.com/apache/incubator-xtable/pull/791 This PR removes redundant DeltaLog.getSnapshotAt(version) calls in the Delta source conversion path that were happening for every commit. getSnapshotAt can internally trigger an expensive Spark job and associated network I/O (e.g., listing/reading Delta log metadata from remote storage) to resolve the snapshot for a given version. We now fetch the snapshot once per commit/version and reuse it to construct the InternalTable (via a DeltaTableExtractor.table(Snapshot, tableName) overload), instead of re-resolving the same snapshot multiple times. Impact - Avoids redundant snapshot resolution work per commit/version (and the Spark job + network calls it may trigger). - Reduces end-to-end conversion latency, especially for large commit backlogs. - No intended functional behavior change; performance optimization only. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
