[jira] [Commented] (PHOENIX-6853) Phoenix site build is broken

2024-04-01 Thread Hari Krishna Dara (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17832750#comment-17832750
 ] 

Hari Krishna Dara commented on PHOENIX-6853:


Right now, neither an older Maven version (I tried 3.6.0) nor 3.8.1 with 
the [mirror workaround|https://stackoverflow.com/a/67835542/95750] works. It 
seems conjars.org is down, and the build times out while trying to download 
various site.xml files from it, so I had to work around this by creating the 
following empty files:

{code:bash}
touch ~/.m2/repository/org/apache/phoenix/phoenix/4.3.1/phoenix-4.3.1-site_en.xml
touch ~/.m2/repository/org/apache/phoenix/phoenix/4.3.1/phoenix-4.3.1-site.xml
touch ~/.m2/repository/org/apache/apache/14/apache-14-site_en.xml
touch ~/.m2/repository/org/apache/apache/14/apache-14-site.xml
{code}
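
For reference, the linked mirror workaround amounts to overriding Maven 3.8+'s 
built-in HTTP-blocking mirror in ~/.m2/settings.xml so that it no longer 
matches external HTTP repositories. A minimal sketch along the lines of that 
Stack Overflow answer (the mirrorOf value is just a repository id that is 
never used):

{code:xml}
<settings>
  <mirrors>
    <mirror>
      <!-- Same id as the built-in blocker, re-pointed at a dummy repo id -->
      <id>maven-default-http-blocker</id>
      <mirrorOf>dummy-repo-never-used</mirrorOf>
      <name>Override of Maven's default HTTP blocking mirror</name>
      <url>http://0.0.0.0/</url>
    </mirror>
  </mirrors>
</settings>
{code}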

 

> Phoenix site build is broken
> 
>
> Key: PHOENIX-6853
> URL: https://issues.apache.org/jira/browse/PHOENIX-6853
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Viraj Jasani
>Assignee: Lokesh Khurana
>Priority: Major
>
> Building the site requires executing build.sh (guidelines: 
> [https://phoenix.apache.org/building_website.html])
> For python3, we can use this command to start the HTTP server locally on port 
> 8000
> {code:java}
> python3 -m http.server 8000 {code}
> The error while building site:
> {code:java}
> $ ./build.sh 
> Generate Phoenix Website
> BUILDING LANGUAGE REFERENCE
> ===
> Target: docs
> Deleting temp
> Deleting docs
> Compiling 541 classes
> Copying 1 files to temp
> Compiling 515 classes
> Copying 15 files to temp
> Compiling 35 classes
> Deleting docs
> Running org.h2.build.doc.XMLChecker
> Running org.h2.build.code.CheckTextFiles
> Running org.h2.build.doc.GenerateDoc
> Running org.h2.build.doc.WebSite
> Running org.h2.build.doc.LinkChecker
> Running org.h2.build.doc.XMLChecker
> Running org.h2.build.doc.SpellChecker
> Done in 18858 ms
> BUILDING SITE
> ===
> [INFO] Scanning for projects...
> [WARNING] 
> [WARNING] Some problems were encountered while building the effective model 
> for org.apache.phoenix:phoenix-site:jar:4.3.1
> [WARNING] Reporting configuration should be done in <reporting> section, not 
> in maven-site-plugin <configuration> as reportPlugins parameter. @ line 52, 
> column 23
> [WARNING] 
> [WARNING] It is highly recommended to fix these problems because they 
> threaten the stability of your build.
> [WARNING] 
> [WARNING] For this reason, future Maven versions might no longer support 
> building such malformed projects.
> [WARNING] 
> [INFO] 
> [INFO] ----------------< org.apache.phoenix:phoenix-site >----------------
> [INFO] Building Phoenix 4.3.1
> [INFO] ------------------------------[ jar ]------------------------------
> [INFO] 
> [INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ phoenix-site ---
> [INFO] 
> [INFO] --- maven-site-plugin:3.3:site (default-site) @ phoenix-site ---
> [INFO] configuring report plugin org.codehaus.mojo:findbugs-maven-plugin:2.5.2
> [INFO] Parent project loaded from repository: 
> org.apache.phoenix:phoenix:pom:4.3.1
> [INFO] Parent project loaded from repository: org.apache:apache:pom:14
> Downloading from apache release: 
> https://repository.apache.org/content/repositories/releases/org/apache/phoenix/phoenix/4.3.1/phoenix-4.3.1-site_en.xml
> Downloading from apache snapshot: 
> https://repository.apache.org/content/repositories/snapshots/org/apache/phoenix/phoenix/4.3.1/phoenix-4.3.1-site_en.xml
> Downloading from sonatype-nexus-snapshots: 
> https://oss.sonatype.org/content/repositories/snapshots/org/apache/phoenix/phoenix/4.3.1/phoenix-4.3.1-site_en.xml
> Downloading from central: 
> https://repo.maven.apache.org/maven2/org/apache/phoenix/phoenix/4.3.1/phoenix-4.3.1-site_en.xml
> [INFO] ------------------------------------------------------------------------
> [INFO] BUILD FAILURE
> [INFO] ------------------------------------------------------------------------
> [INFO] Total time:  6.414 s
> [INFO] Finished at: 2023-01-12T10:14:39-08:00
> [INFO] ------------------------------------------------------------------------
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-site-plugin:3.3:site (default-site) on project 
> phoenix-site: SiteToolException: The site descriptor cannot be resolved from 
> the repository: ArtifactResolutionException: Unable to locate site 
> descriptor: Could not transfer artifact 
> org.apache.phoenix:phoenix:xml:site_en:4.3.1 from/to 
> maven-default-http-blocker (http://0.0.0.0/): Blocked mirror for 
> repositories: [conjars.org (http://conjars.org/repo, default, 
> releases+snapshots), apache.snapshots 
> (http://repository.apache.org/snapshots, default, snapshots)]
> [ERROR]   org.apache.phoenix:phoenix:xml:4.3.1
> [ERROR] 

[jira] (PHOENIX-7001) Change Data Capture leveraging Max Lookback and Uncovered Indexes

2024-01-01 Thread Hari Krishna Dara (Jira)


[ https://issues.apache.org/jira/browse/PHOENIX-7001 ]


Hari Krishna Dara deleted comment on PHOENIX-7001:


was (Author: haridsv):
Resolved wrong item.

> Change Data Capture leveraging Max Lookback and Uncovered Indexes
> -
>
> Key: PHOENIX-7001
> URL: https://issues.apache.org/jira/browse/PHOENIX-7001
> Project: Phoenix
>  Issue Type: Improvement
>Reporter: Kadir Ozdemir
>Priority: Major
>
> The use cases for a Change Data Capture (CDC) feature are centered around 
> capturing changes to a given table (or updatable view) as these changes 
> happen in near real-time. A CDC application can retrieve changes in real-time 
> or with some delay, or even retrieve the same set of changes multiple times. 
> This means the CDC use case can be generalized as time range queries, where 
> the time range is typically short, such as the last x minutes or hours, or 
> expressed as a specific time range within the last n days, where n is 
> typically less than 7.
> A change is an update to a row. That is, a change either updates one or more 
> columns of a table for a given row or deletes a row. It is desirable to 
> deliver these changes in the order of their arrival. One can visualize the 
> delivery of these changes as a stream from a Phoenix table to the 
> application, initiated by the application, similar to the delivery of any 
> other Phoenix query results. The difference is that a regular query result 
> includes at most one result row for each row satisfying the query, and 
> deleted rows are not visible to it, whereas the CDC stream/result can include 
> multiple result rows for each row and does include deleted rows. Some use 
> cases also need the pre and/or post image of the row along with each change 
> on the row. 
> The design proposed here leverages Phoenix Max Lookback and Uncovered (Global 
> or Local) Indexes. The max lookback feature retains recent changes to a 
> table, that is, typically the changes made in the last x days. This means the 
> max lookback feature already captures the changes to a given table. 
> Currently, the max lookback age is configurable at the cluster level. We need 
> to extend this capability so that the max lookback age can be configured at 
> the table level, letting each table have a different max lookback age based 
> on its CDC application requirements.
> To deliver the changes in the order of their arrival, we need a time based 
> index. This index should be uncovered, as the changes are already retained in 
> the table by the max lookback feature. The arrival time can be defined as the 
> mutation timestamp generated by the server, or a user-specified timestamp (or 
> any other long integer) column. An uncovered index would allow us efficient 
> and ordered access to the changes. Changes to an index table are also 
> preserved by the max lookback feature.
> A CDC feature can be composed of the following components:
>  * {*}CDCUncoveredIndexRegionScanner{*}: This is a server side scanner on an 
> uncovered index used for CDC. This can inherit UncoveredIndexRegionScanner. 
> It goes through index table rows using a raw scan to identify data table rows 
> and retrieves these rows using a raw scan. Using the time range, it forms a 
> JSON blob to represent changes to the row including pre and/or post row 
> images.
>  * {*}CDC Query Compiler{*}: This is a client side component. It prepares the 
> scan object based on the given CDC query statement. 
>  * {*}CDC DDL Compiler{*}: This is a client side component. It creates the 
> time based uncovered (global/local) index based on the given CDC DDL 
> statement and a virtual table of CDC type. CDC will be a new table type. 
> The CDC DDL syntax to create CDC on a (data) table can be as follows: 
> CREATE CDC <cdc name> ON <table name> (PHOENIX_ROW_TIMESTAMP() | <timestamp 
> column name>) INCLUDE (pre | post | latest | all) TTL = <time to live in 
> seconds> INDEX = <global | local> SALT_BUCKETS=<number>
> The above CDC DDL creates a virtual CDC table and an uncovered index. The CDC 
> table PK columns start with the timestamp or user defined column and continue 
> with the data table PK columns. The CDC table includes one non-PK column 
> which is a JSON column. The change is expressed in this JSON column in 
> multiple ways based on the CDC DDL or query statement. The change can be 
> expressed as just the mutation for the change, the latest image of the row, 
> the pre image of the row (the image before the change), the post image, or 
> any combination of these. The CDC table is not a physical table on disk. It 
> is just a virtual table to be used in a CDC query. Phoenix stores just the 
> metadata for this virtual table. 
> A CDC query can be as follows:
> SELECT * FROM <cdc name> WHERE PHOENIX_ROW_TIMESTAMP() >= TO_DATE(…) 
> AND PHOENIX_ROW_TIMESTAMP() < TO_DATE(…)
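
Since the grammar above is only a design proposal, here is a purely 
hypothetical instantiation of it through JDBC (the URL, table name, CDC name, 
timestamps, and column access are all invented for illustration):

{code:java}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class CdcSketch {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost:2181");
             Statement stmt = conn.createStatement()) {
            // Proposed DDL: creates the virtual CDC table plus its time based uncovered index.
            stmt.execute("CREATE CDC ORDERS_CDC ON ORDERS (PHOENIX_ROW_TIMESTAMP()) INCLUDE (all)");
            // Proposed query shape: a short time range over the CDC table.
            try (ResultSet rs = stmt.executeQuery(
                    "SELECT * FROM ORDERS_CDC "
                    + "WHERE PHOENIX_ROW_TIMESTAMP() >= TO_DATE('2024-01-01 00:00:00') "
                    + "AND PHOENIX_ROW_TIMESTAMP() < TO_DATE('2024-01-01 01:00:00')")) {
                while (rs.next()) {
                    // Per the design, the last (non-PK) column would carry the JSON change blob.
                    System.out.println(rs.getString(rs.getMetaData().getColumnCount()));
                }
            }
        }
    }
}
{code}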

[jira] [Commented] (PHOENIX-7015) Extend UncoveredGlobalIndexRegionScanner for CDC region scanner usecase

2024-01-01 Thread Hari Krishna Dara (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-7015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17801515#comment-17801515
 ] 

Hari Krishna Dara commented on PHOENIX-7015:


Some PoC changes have been included in this PR: 
https://github.com/apache/phoenix/pull/1766

> Extend UncoveredGlobalIndexRegionScanner for CDC region scanner usecase
> ---
>
> Key: PHOENIX-7015
> URL: https://issues.apache.org/jira/browse/PHOENIX-7015
> Project: Phoenix
>  Issue Type: Sub-task
>Reporter: Viraj Jasani
>Priority: Major
>
> For the CDC region scanner use case, extend UncoveredGlobalIndexRegionScanner 
> to CDCUncoveredGlobalIndexRegionScanner. The new region scanner for CDC 
> performs a raw scan on the index table and retrieves data table rows from the 
> index rows.
> Using the time range, it can form a JSON blob to represent changes to the 
> row, including pre and/or post row images.





[jira] (PHOENIX-7001) Change Data Capture leveraging Max Lookback and Uncovered Indexes

2024-01-01 Thread Hari Krishna Dara (Jira)


[ https://issues.apache.org/jira/browse/PHOENIX-7001 ]


Hari Krishna Dara deleted comment on PHOENIX-7001:


was (Author: haridsv):
PR: https://github.com/apache/phoenix/pull/1766

> Change Data Capture leveraging Max Lookback and Uncovered Indexes
> -
>
> Key: PHOENIX-7001
> URL: https://issues.apache.org/jira/browse/PHOENIX-7001
> Project: Phoenix
>  Issue Type: Improvement
>Reporter: Kadir Ozdemir
>Priority: Major
>

[jira] (PHOENIX-7001) Change Data Capture leveraging Max Lookback and Uncovered Indexes

2024-01-01 Thread Hari Krishna Dara (Jira)


[ https://issues.apache.org/jira/browse/PHOENIX-7001 ]


Hari Krishna Dara deleted comment on PHOENIX-7001:


was (Author: haridsv):
Change merged into the feature branch.

> Change Data Capture leveraging Max Lookback and Uncovered Indexes
> -
>
> Key: PHOENIX-7001
> URL: https://issues.apache.org/jira/browse/PHOENIX-7001
> Project: Phoenix
>  Issue Type: Improvement
>Reporter: Kadir Ozdemir
>Priority: Major
>

[jira] [Commented] (PHOENIX-7001) Change Data Capture leveraging Max Lookback and Uncovered Indexes

2024-01-01 Thread Hari Krishna Dara (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-7001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17801509#comment-17801509
 ] 

Hari Krishna Dara commented on PHOENIX-7001:


PR: https://github.com/apache/phoenix/pull/1766

> Change Data Capture leveraging Max Lookback and Uncovered Indexes
> -
>
> Key: PHOENIX-7001
> URL: https://issues.apache.org/jira/browse/PHOENIX-7001
> Project: Phoenix
>  Issue Type: Improvement
>Reporter: Kadir Ozdemir
>Priority: Major
>

[jira] [Comment Edited] (PHOENIX-6821) Batching with auto-commit connections

2022-12-06 Thread Hari Krishna Dara (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17643844#comment-17643844
 ] 

Hari Krishna Dara edited comment on PHOENIX-6821 at 12/7/22 5:31 AM:
-

While trying to understand the current implementation, I noticed one odd thing, 
and I am proposing that we fix it. The issue is that the JDBC batch API seems 
to be meant only for DML or DDL statements (i.e., _not_ for DQL). While the 
spec doesn’t say so explicitly, here is how I came to that conclusion:
 * The 
[javadoc|https://docs.oracle.com/javase/8/docs/api/java/sql/Statement.html#addBatch-java.lang.String-]
 for {{addBatch}} says {_}“typically this is a SQL INSERT or UPDATE 
statement”{_}. This implies “typically this is not a SELECT statement”, which 
is slightly vague and leaves some scope for interpretation. However,
 * if you look at the {{executeBatch}} API, you will realize that there is no 
provision to return a {{ResultSet}}. The return value is an {{int[]}} holding 
the update counts from each of the batched statements, so it seems DQL was not 
considered.
 * I also tried a quick experiment. I took a SQLite JDBC batch API sample, 
inserted a SELECT statement into the batch, and got the error 
{{java.sql.BatchUpdateException: batch entry 2: query returns results}}, which 
means the driver actively detects whether any statement returns a ResultSet 
and flags it as an error.
 * I then repeated the same experiment on MySQL and got the exception 
{{java.sql.BatchUpdateException: Statement.executeUpdate() or 
Statement.executeLargeUpdate() cannot issue statements that produce result 
sets.}}

 
I am not against supporting this in Phoenix as a sort of “extension”, but I 
have a few concerns:
 # As mentioned above, the batch API itself doesn’t provide a way to access 
result sets, but this feature may still be usable via {{getResultSets}} on the 
statement. However, that includes result sets for any DMLs executed ahead of 
the batch using the same statement object that are still open, so it can be 
quite unwieldy and even misleading in some situations.
 # Even if we are willing to live with the above limitations, the JDBC doc 
itself doesn’t talk about this ability (in fact, I can’t find any reference to 
it, such as blogs or forum posts, via Google), so even if we continue to 
support it, it is unlikely that anyone would actually make use of it.

Considering all the above context, I think we should stop supporting DQL in 
batches and flag its usage as an error, like the SQLite and MySQL drivers do.
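
To make the SQLite experiment above concrete, here is a minimal sketch of the 
kind of probe I mean (assumes the xerial sqlite-jdbc driver on the classpath; 
the table and values are made up):

{code:java}
import java.sql.BatchUpdateException;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

public class BatchDqlProbe {
    public static void main(String[] args) throws SQLException {
        try (Connection conn = DriverManager.getConnection("jdbc:sqlite::memory:");
             Statement stmt = conn.createStatement()) {
            stmt.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, v TEXT)");
            stmt.addBatch("INSERT INTO t VALUES (1, 'a')");
            stmt.addBatch("INSERT INTO t VALUES (2, 'b')");
            stmt.addBatch("SELECT * FROM t"); // DQL slipped into the batch
            try {
                // Returns an int[] of update counts only; no ResultSet is reachable here.
                int[] counts = stmt.executeBatch();
                System.out.println("update counts: " + counts.length);
            } catch (BatchUpdateException e) {
                // sqlite reports e.g. "batch entry 2: query returns results"
                System.out.println("driver rejected DQL in batch: " + e.getMessage());
            }
        }
    }
}
{code}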
 



[jira] [Commented] (PHOENIX-6821) Batching with auto-commit connections

2022-12-06 Thread Hari Krishna Dara (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17643844#comment-17643844
 ] 

Hari Krishna Dara commented on PHOENIX-6821:


While trying to understand the current implementation, I noticed one odd thing, 
and I am proposing that we fix it. The issue is that the JDBC batch API seems 
to be meant only for DML or DDL statements (i.e., _not_ for DQL). While the 
spec doesn’t say so explicitly, here is how I came to that conclusion:
 * The 
[javadoc|https://docs.oracle.com/javase/8/docs/api/java/sql/Statement.html#addBatch-java.lang.String-]
 for {{addBatch}} says {_}“typically this is a SQL INSERT or UPDATE 
statement”{_}. This implies “typically this is not a SELECT statement”, which 
is slightly vague and leaves some scope for interpretation. However,
 * if you look at the {{executeBatch}} API, you will realize that there is no 
provision to return a {{ResultSet}}. The return value is an {{int[]}} holding 
the update counts from each of the batched statements, so it seems DQL was not 
considered.
 * I also tried a quick experiment. I took a SQLite JDBC batch API sample, 
inserted a SELECT statement into the batch, and got the error 
{{java.sql.BatchUpdateException: batch entry 2: query returns results}}, which 
means the driver actively detects whether any statement returns a ResultSet 
and flags it as an error.
 * I then repeated the same experiment on MySQL and got the exception 
{{java.sql.BatchUpdateException: Statement.executeUpdate() or 
Statement.executeLargeUpdate() cannot issue statements that produce result 
sets.}}

 
I am not against supporting this in Phoenix as a sort of “extension”, but I 
have a few concerns:
 # As mentioned above, the batch API itself doesn’t provide a way to access 
result sets, but this feature may still be usable via {{getResultSets}} on the 
statement. However, that includes result sets for any DMLs executed ahead of 
the batch using the same statement object that are still open, so it can be 
quite unwieldy and even misleading in some situations.
 # Even if we are willing to live with the above limitations, the JDBC doc 
itself doesn’t talk about this ability (in fact, I can’t find any reference to 
it, such as blogs or forum posts, via Google), so even if we continue to 
support it, it is unlikely that anyone would actually make use of it.

Considering all the above context, I think we should stop supporting DQL in 
batches and flag its usage as an error, like the SQLite and MySQL drivers do.
 

> Batching with auto-commit connections
> -
>
> Key: PHOENIX-6821
> URL: https://issues.apache.org/jira/browse/PHOENIX-6821
> Project: Phoenix
>  Issue Type: Improvement
>Reporter: Kadir Ozdemir
>Assignee: Hari Krishna Dara
>Priority: Major
>
> Phoenix commits the commands of a batch individually when executeBatch() is 
> called if auto commit is enabled on the connection.  For example, if a batch 
> of 100 upsert statements is created using addBatch() within an auto-commit 
> mode connection then when executeBatch() is called, Phoenix creates 100 HBase 
> batches each with a single mutation, i.e., one for each upsert. This defeats 
> the purpose of batching. The correct behavior is to commit the entire batch 
> of upsert statements using the minimum number of HBase batches. This means if 
> the entire batch of upsert statements fits in a single HBase batch, then one 
> HBase batch should be used.
> Please note that for connections without auto-commit, Phoenix behaves 
> correctly, that is, the entire batch of upsert commands is committed using 
> the minimum number of HBase batches.
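
For illustration, a minimal sketch of the batching pattern the description 
refers to (the JDBC URL and table are placeholders; assumes a reachable 
Phoenix cluster with a matching table):

{code:java}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class AutoCommitBatchUpsert {
    public static void main(String[] args) throws SQLException {
        // Placeholder URL/table, e.g. CREATE TABLE MY_TABLE (ID INTEGER PRIMARY KEY, VAL VARCHAR).
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost:2181");
             PreparedStatement ps = conn.prepareStatement(
                     "UPSERT INTO MY_TABLE (ID, VAL) VALUES (?, ?)")) {
            conn.setAutoCommit(true);
            for (int i = 0; i < 100; i++) {
                ps.setInt(1, i);
                ps.setString(2, "v" + i);
                ps.addBatch();
            }
            // Per this issue, with auto-commit on, these 100 upserts currently go out
            // as 100 single-mutation HBase batches instead of one (or a few) batches.
            ps.executeBatch();
        }
    }
}
{code}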



--
This message was sent by Atlassian Jira
(v8.20.10#820010)