[GitHub] [nifi] readl1 commented on pull request #2231: NIFI-4521 MS SQL CDC Processor

2021-03-17 Thread GitBox


readl1 commented on pull request #2231:
URL: https://github.com/apache/nifi/pull/2231#issuecomment-801118463


   @patricker 
   OK, I have found a way to reproduce this. Let me know if we can get on a 
screen share so I can show you the issue. I have it running on a 30-second 
interval, and it pulls yesterday's data on every run. 
   
   SELECT max(sys.fn_cdc_map_lsn_to_time ( __$start_lsn )) FROM 
   returned: 2021-03-16 18:17:34.163
   
   Attribute in NiFi: maxvalue.tran_end_time = 2021-03-16 18:17:34.163
   
   Value stored in the state: 2021-03-16 18:17:34.163
   
   How do you compare the current time stored in the state to the time coming 
from the cdc table?
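
One way to frame the comparison being asked about is to parse the timestamp saved in state and treat only strictly later CDC timestamps as new. The following is a minimal sketch under that assumption, not the processor's actual logic; `MaxTimeCheck` and `hasNewChanges` are hypothetical names.

```java
import java.sql.Timestamp;

// Hypothetical helper for illustration; not part of the NiFi processor.
final class MaxTimeCheck {

    // Returns true only when the CDC max time is strictly later than the stored state.
    static boolean hasNewChanges(String storedState, Timestamp cdcMax) {
        if (cdcMax == null) {
            return false; // nothing in the CDC table yet
        }
        if (storedState == null || storedState.isEmpty()) {
            return true; // first run: no state saved
        }
        // Timestamp.valueOf expects the form yyyy-mm-dd hh:mm:ss[.fffffffff]
        Timestamp stored = Timestamp.valueOf(storedState);
        return cdcMax.after(stored);
    }

    public static void main(String[] args) {
        Timestamp cdcMax = Timestamp.valueOf("2021-03-16 18:17:34.163");
        System.out.println(hasNewChanges("2021-03-16 18:17:34.163", cdcMax));
    }
}
```

With the stored value and the CDC value equal, as reported above, a strict comparison of this kind would not report new changes, so a repeated pull would have to come from somewhere else, for example the lower bound used in the query or a conversion applied before the comparison.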
   
   

   
   
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [nifi] readl1 commented on pull request #2231: NIFI-4521 MS SQL CDC Processor

2021-03-16 Thread GitBox


readl1 commented on pull request #2231:
URL: https://github.com/apache/nifi/pull/2231#issuecomment-800749080


   What if the CDC data hasn't changed? I have a 2-hour run frequency, and
   it's not uncommon to have the same data pushed 5 or 6 times.
   
   On Tue, Mar 16, 2021, 9:41 PM Peter Wicks ***@***.***> wrote:
   
   > @readl1  Hmm. State is one of the last things
   > we save, and if state save fails then we remove the whole file.
   >
   > stateManager.setState(statePropertyMap, Scope.CLUSTER);
   > session.commit();
   > } catch (IOException e) {
   > session.remove(cdcFlowFile);
   >
   







[GitHub] [nifi] readl1 commented on pull request #2231: NIFI-4521 MS SQL CDC Processor

2021-03-16 Thread GitBox


readl1 commented on pull request #2231:
URL: https://github.com/apache/nifi/pull/2231#issuecomment-800354787


   > I am starting to see this processor deliver duplicate records. Have 
@patricker @ravitejatvs seen something similar? It's like the state is not being 
updated on the next run and it pulls the same data again.
   
   I wonder whether there could be a loop if the SQL Server uses a different 
time zone than the NiFi cluster.
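
To illustrate the time zone concern, here is a minimal sketch showing how the same wall-clock value maps to different instants when two hosts assume different zones. The class name `ZoneSkew`, the method `skew`, and the UTC and UTC-5 offsets are assumptions chosen for the example, not values taken from either system.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

// Hypothetical helper for illustration only.
final class ZoneSkew {

    // Difference between the instants two hosts derive from the same wall-clock value.
    static Duration skew(LocalDateTime wallClock, ZoneId dbZone, ZoneId nifiZone) {
        Instant asSeenByDb = wallClock.atZone(dbZone).toInstant();
        Instant asSeenByNifi = wallClock.atZone(nifiZone).toInstant();
        return Duration.between(asSeenByDb, asSeenByNifi);
    }

    public static void main(String[] args) {
        LocalDateTime maxTime = LocalDateTime.parse("2021-03-16T18:17:34.163");
        // Assumed zones: database in UTC, NiFi node at UTC-5.
        System.out.println(skew(maxTime, ZoneOffset.UTC, ZoneOffset.ofHours(-5)));
    }
}
```

If a skew like this existed, a stored maximum could land hours away from the value the database compares against, which would be one way for the same window of rows to be selected on every run.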







[GitHub] [nifi] readl1 commented on pull request #2231: NIFI-4521 MS SQL CDC Processor

2021-03-15 Thread GitBox


readl1 commented on pull request #2231:
URL: https://github.com/apache/nifi/pull/2231#issuecomment-799757087


   I am starting to see this processor deliver duplicate records. Have 
@patricker @ravitejatvs seen something similar? It's like the state is not being 
updated on the next run and it pulls the same data again. 







[GitHub] [nifi] readl1 commented on pull request #2231: NIFI-4521 MS SQL CDC Processor

2021-03-10 Thread GitBox


readl1 commented on pull request #2231:
URL: https://github.com/apache/nifi/pull/2231#issuecomment-796258426


   > @ravitejatvs @readl1 I've been trying to get binary(10) working, I spent a 
lot of time on Friday. It's not that it's actually difficult, I found a way to 
store the binary(10) value as hex. It's that my unit test framework uses Apache 
DB, and the same functions don't exist in Apache DB that I can tell. Still 
researching.
   
   Let me know if I can help at all. I have a DB I can test this against: 
an MS SQL box with values larger than bigint. 
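
As a rough illustration of storing a binary(10) value as hex, the sketch below encodes a 10-byte array as a fixed-width 20-character string. It assumes the driver returns the column as a `byte[]`; `LsnHex` and its methods are hypothetical names, not code from the PR.

```java
import java.math.BigInteger;

// Hypothetical helper for illustration; not code from the pull request.
final class LsnHex {

    // Encode a 10-byte value as a fixed-width, upper-case hex string (20 characters).
    static String toHex(byte[] lsn) {
        if (lsn == null || lsn.length != 10) {
            throw new IllegalArgumentException("expected exactly 10 bytes");
        }
        StringBuilder sb = new StringBuilder(20);
        for (byte b : lsn) {
            sb.append(String.format("%02X", b & 0xFF)); // mask to treat the byte as unsigned
        }
        return sb.toString();
    }

    // Interpret the same bytes as an unsigned big-endian integer.
    static BigInteger toUnsigned(byte[] lsn) {
        return new BigInteger(1, lsn);
    }

    public static void main(String[] args) {
        byte[] lsn = {0x00, 0x00, 0x00, 0x2A, 0x00, 0x00, 0x01, 0x10, 0x00, 0x03};
        System.out.println(toHex(lsn));
    }
}
```

Because every encoded value has the same width, plain string comparison of the hex form orders values the same way as comparing the unsigned bytes, which keeps a saved maximum usable as a lower bound without casting to a numeric type.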







[GitHub] [nifi] readl1 commented on pull request #2231: NIFI-4521 MS SQL CDC Processor

2021-01-01 Thread GitBox


readl1 commented on pull request #2231:
URL: https://github.com/apache/nifi/pull/2231#issuecomment-753328778


   I believe DECIMAL(38) holds up to 10^38 - 1, not 2^38 - 1.
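
The ranges under discussion can be checked with a little arithmetic. This sketch compares the largest BIGINT value, the largest DECIMAL(38, 0) value, and the largest value of a 10-byte field, under the assumption that the 10 bytes are read as an unsigned 80-bit integer; `SqlServerRanges` and `fits` are hypothetical names.

```java
import java.math.BigInteger;

// Hypothetical helper for illustration only.
final class SqlServerRanges {

    // Largest BIGINT: 2^63 - 1
    static final BigInteger BIGINT_MAX = BigInteger.valueOf(Long.MAX_VALUE);

    // Largest DECIMAL(38, 0): 10^38 - 1
    static final BigInteger DECIMAL38_MAX = BigInteger.TEN.pow(38).subtract(BigInteger.ONE);

    // Largest 10-byte value read as unsigned: 2^80 - 1 (an assumption about interpretation)
    static final BigInteger BINARY10_MAX = BigInteger.ONE.shiftLeft(80).subtract(BigInteger.ONE);

    static boolean fits(BigInteger value, BigInteger max) {
        return value.compareTo(max) <= 0;
    }

    public static void main(String[] args) {
        System.out.println("BIGINT max:      " + BIGINT_MAX);
        System.out.println("binary(10) max:  " + BINARY10_MAX);
        System.out.println("DECIMAL(38) max: " + DECIMAL38_MAX);
        System.out.println("fits BIGINT?      " + fits(BINARY10_MAX, BIGINT_MAX));
        System.out.println("fits DECIMAL(38)? " + fits(BINARY10_MAX, DECIMAL38_MAX));
    }
}
```

Under that interpretation, a full 10-byte value exceeds the BIGINT range but stays within DECIMAL(38, 0).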
   
   On Thu, Dec 31, 2020 at 4:50 PM Peter Wicks 
   wrote:
   
   > *@patricker* commented on this pull request.
   > --
   >
   > In nifi-nar-bundles/nifi-cdc/nifi-cdc-mssql-bundle/nifi-cdc-mssql-processors/src/main/java/org/apache/nifi/cdc/mssql/MSSQLCDCUtils.java:
   >
   > > +        sbQuery.append(_columnSplit);
   > +        sbQuery.append(getCURRENT_TIMESTAMP() + " EXTRACT_TIME");
   > +        sbQuery.append("\n");
   > +        sbQuery.append("FROM " + tableInfo.getSourceSchemaName() + ".\""+ tableInfo.getSourceTableName() + "\"");
   > +
   > +        return sbQuery.toString();
   > +    }
   > +
   > +    public String getCDCSelectStatement(MSSQLTableInfo tableInfo, boolean includePreupdateValues, Timestamp maxTime){
   > +        final StringBuilder sbQuery = new StringBuilder();
   > +
   > +        sbQuery.append("SELECT t.tran_begin_time\n" +
   > +                ",t.tran_end_time \"tran_end_time\"\n" +
   > +                ",CAST(t.tran_id AS bigint) trans_id\n" +
   > +                ",CAST(\"o\".\"__$start_lsn\" AS bigint) start_lsn\n" +
   > +                ",CAST(\"o\".\"__$seqval\" AS bigint) seqval\n" +
   >
   > I was just researching this. BIGINT is actually bigger than decimal in MS
   > SQL? It looks like BIGINT actually holds up to 2^63 (-1), where as
   > DECIMAL(38) only holds up to 2^38 (-1). So... unless I'm missing something
   > I don't see a reason to move to DECIMAL. In fact, based on research, I
   > think BIGINT might be the best fit.
   >
   







[GitHub] [nifi] readl1 commented on pull request #2231: NIFI-4521 MS SQL CDC Processor

2020-08-27 Thread GitBox


readl1 commented on pull request #2231:
URL: https://github.com/apache/nifi/pull/2231#issuecomment-681952830


   Sure that works!
   
   On Wed, Aug 26, 2020, 10:14 PM Peter Wicks  wrote:
   
   > @readl1  I can add those, sure. How about INFO
   > level for the timestamps, and DEBUG level for SQL Statements?
   >
   







[GitHub] [nifi] readl1 commented on pull request #2231: NIFI-4521 MS SQL CDC Processor

2020-08-26 Thread GitBox


readl1 commented on pull request #2231:
URL: https://github.com/apache/nifi/pull/2231#issuecomment-681025535


   @patricker @mattyb149 Can we add some optional logging that outputs the 
SQL statement text, so that it is easier to troubleshoot? Could it also log the 
timestamps? 
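
A sketch of what such optional logging might look like, with timestamps at an always-on level and statement text at a more verbose one. It uses `java.util.logging` as a stand-in so the example is self-contained (FINE plays the role of DEBUG); a NiFi processor would normally use its own component logger. `CdcQueryLogging` and `logExtract` are hypothetical names.

```java
import java.sql.Timestamp;
import java.util.logging.Logger;

// Hypothetical helper for illustration; java.util.logging stands in for the processor's logger.
final class CdcQueryLogging {

    private static final Logger LOG = Logger.getLogger(CdcQueryLogging.class.getName());

    static void logExtract(Logger log, String sql, Timestamp maxTime) {
        // Timestamps at INFO, so they show up at the default level.
        log.info(() -> "CDC extract using max tran_end_time " + maxTime);
        // Statement text at FINE; the supplier is only evaluated when FINE is enabled.
        log.fine(() -> "CDC statement:\n" + sql);
    }

    public static void main(String[] args) {
        logExtract(LOG, "SELECT 1", Timestamp.valueOf("2021-03-16 18:17:34.163"));
    }
}
```

Building the messages through suppliers keeps the verbose statement text from being assembled unless that level is switched on.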







[GitHub] [nifi] readl1 commented on pull request #2231: NIFI-4521 MS SQL CDC Processor

2020-07-22 Thread GitBox


readl1 commented on pull request #2231:
URL: https://github.com/apache/nifi/pull/2231#issuecomment-662501066


   @patricker @mattyb149 I noticed another issue with the CDC processor. We are 
getting duplicate records pulled from the processor, with the only difference 
being the extract time. The duplicates only happen on some inserts 
(operation 2).
   
   This is for a ~12 hour span
   
   
![image](https://user-images.githubusercontent.com/9344901/88191362-5d481b00-cc09-11ea-90fc-a23eca649362.png)
   
   


