Re: ExecuteSQL Extract database tables multiple times.

Marcelo Valle Ávila Sat, 05 Mar 2016 14:48:55 -0800

Sorry, I did not see that you are using MS SQL Server.
I deployed a host with MS SQL and the issue reproduces too.


My environment:

NiFi 0.5.1
Java 7
MS SQL Server 2008

It doesn't work with Oracle either, but with DB2 it works perfectly.


2016-03-05 22:45 GMT+01:00 Marcelo Valle Ávila <[email protected]>:

> Hello Ralf,
>
> I'm seeing the same behaviour when reading data from an Oracle DB:
>
> failed to process due to org.apache.avro.SchemaParseException: Empty name
>
> With NiFi 0.4.1 the ExecuteSQL processor works fine; it seems that in 0.5.0
> and 0.5.1 there is some bug with Oracle databases.
>
> I tested the NiFi 0.5.1 processor against a DB2 database and it works fine.
>
> What database engine are you using?
>
> Regards!
>
>
> 2016-03-05 10:36 GMT+01:00 Ralf Meier <[email protected]>:
>
>> Hi,
>>
>> thanks Matt for clarifying things. I got the processor working just fine
>> with MySQL.
>> Now I tried to use it with MS SQL, but here I get some issues and could
>> not figure out why it is not working.
>>
>> My Configuration is:
>>
>> Nifi: 0.5.0
>> Java 8
>> MS SQL 2014
>>
>> DBCPConnectionPool:
>> Database Connection URL: jdbc:sqlserver://192.168.79.252:1433;databaseName=testdb
>> Class Name: com.microsoft.sqlserver.jdbc.SQLServerDriver
>> Jar Url: file:///Users/rmeier/Downloads/tmp/sqljdbc42.jar
>> Database user: sa
>> Password: *********
>>
>> In the ExecuteSQL I have the following configuration:
>> MY Connection Pooling.
>> SQL select query: select * from tuser;
>>
>> Max Wait Time: 0 seconds
>>
>> But when I run the processor I get the following error:
>>
>> 10:30:02 CET ERROR
>> ExecuteSQL[id=d32x32d7-c477-4b3b-a8b9-a77d0be27273]
>> ExecuteSQL[id=d32x32d7-c477-4b3b-a8b9-a77d0be27273] failed to process due
>> to org.apache.avro.SchemaParseException: Empty name; rolling back session:
>> org.apache.avro.SchemaParseException: Empty name
>>
>> 10:30:02 CET ERROR
>> ExecuteSQL[id=d32x32d7-c477-4b3b-a8b9-a77d0be27273] Processor
>> Administratively Yielded for 1 sec due to processing failure
>>
>>
>> Does anybody have an idea how to solve this issue and what the root
>> cause is?
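
On the error message itself: the Avro specification requires record and field names to be non-empty and to match `[A-Za-z_][A-Za-z0-9_]*`, and "Empty name" is what the Avro Java library reports when a name is an empty string. Which name is empty here is not established in this thread; one plausible assumption is that the JDBC driver reports an empty table name or column label in the result set metadata. A minimal sketch of a defensive fallback, where `avro_safe_name` and the fallback value are hypothetical names chosen for illustration and not NiFi code:

```python
import re

# Name rule from the Avro specification: a letter or underscore first,
# then letters, digits, or underscores. An empty string never matches.
AVRO_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def avro_safe_name(name, fallback):
    """Return name if it is a valid Avro name, otherwise the fallback.

    Hypothetical helper for illustration only.
    """
    if name and AVRO_NAME.fullmatch(name):
        return name
    return fallback


# An empty name (the assumed trigger of the error) falls back to a default.
record_name = avro_safe_name("", "ExecuteSQL_Record")
```

Until a fix is available, aliasing every column explicitly in the query (instead of `select *`) may be worth trying, though that is a guess rather than a confirmed workaround.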
>>
>> Thanks again for your help.
>> Ralf
>>
>>
>>
>> On 04.03.2016 at 21:17, Matt Burgess <[email protected]> wrote:
>>
>> Currently ExecuteSQL will put all available rows into a single flow file.
>> There is a Jira case (https://issues.apache.org/jira/browse/NIFI-1251)
>> to allow the user to break up the result set into flow files containing a
>> specified number of records.
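
The idea of breaking a result set into groups of a specified size can be sketched in a few lines. This is only an illustration of the batching concept, not the implementation proposed in NIFI-1251; `batch_rows` and `max_rows_per_batch` are hypothetical names, and each yielded list stands in for the content of one flow file.

```python
from itertools import islice


def batch_rows(rows, max_rows_per_batch):
    """Yield lists of at most max_rows_per_batch rows from any iterable."""
    if max_rows_per_batch < 1:
        raise ValueError("max_rows_per_batch must be at least 1")
    iterator = iter(rows)
    while True:
        # Pull the next group of rows without loading the whole result set.
        batch = list(islice(iterator, max_rows_per_batch))
        if not batch:
            return
        yield batch


# Three rows split into batches of at most two rows each.
rows = [(1, "alice"), (2, "bob"), (3, "carol")]
batches = list(batch_rows(rows, 2))
```

Because the helper consumes an iterator lazily, it also works with a database cursor, so a table with millions of rows would not need to be held in memory at once.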
>>
>> I'm not sure why you get 26 flow files, although if you let the flow run
>> for 26 seconds you should see 26 flow files, each with the contents of the
>> "users" table. This is because it will run every second (per your config)
>> and execute the same query ("SELECT * FROM users") every time.  There is a
>> new processor in the works (https://issues.apache.org/jira/browse/NIFI-1575)
>> that will allow the
>> user to specify "maximum value columns", where the max values for each
>> specified column will be kept track of, so that each subsequent execution
>> of the processor will only retrieve rows whose values for those columns are
>> greater than the currently-held maximum value. An example would be a users
>> table with a primary key user_id, which is strictly increasing. The
>> processor would run once, fetching all available records, then unless a new
>> row is added (with a higher user_id value), no flow files will be output.
>> If rows are added in the meantime, then upon the next execution of the
>> processor, only those "new" rows will be output.
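
The maximum-value-column behaviour described above can be sketched with Python's built-in `sqlite3` module. This is a rough illustration under stated assumptions, not the processor from NIFI-1575: the `users` table, its `user_id` and `name` columns, and the helper `fetch_new_rows` are made up for the example, and the starting maximum of 0 assumes positive ids.

```python
import sqlite3


def fetch_new_rows(conn, last_max_id):
    """Return rows with user_id above last_max_id, plus the updated maximum."""
    rows = conn.execute(
        "SELECT user_id, name FROM users WHERE user_id > ? ORDER BY user_id",
        (last_max_id,),
    ).fetchall()
    # Keep the old maximum when nothing new was found.
    new_max = rows[-1][0] if rows else last_max_id
    return rows, new_max


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "alice"), (2, "bob")])

first, max_id = fetch_new_rows(conn, 0)        # first run: both rows
second, max_id = fetch_new_rows(conn, max_id)  # nothing added: no rows
conn.execute("INSERT INTO users VALUES (?, ?)", (3, "carol"))
third, max_id = fetch_new_rows(conn, max_id)   # only the newly added row
```

The stored maximum is the only state carried between runs, which is why a strictly increasing column such as a primary key is needed for this approach to be reliable.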
>>
>> I'm happy to help you work through this if you'd like to provide more
>> details about your table setup (columns, rows) and flow.
>>
>> Regards,
>> Matt
>>
>> On Fri, Mar 4, 2016 at 3:04 PM, Ralf Meier <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> I tried to understand the ExecuteSQL processor.
>>> I created a database with a table "users". This table has two entries.
>>>
>>> The problem with the processor is that it selected the entries from the
>>> table multiple times and created 26 flow files altogether, even though
>>> only two entries were available. In addition, each flow file contains
>>> both entries.
>>>
>>> I configured the executeSQL Processor the following way:
>>> Settings: Didn't change anything here except auto-terminate on
>>> failure.
>>> Scheduling:
>>>         Cron based: * * * * * ? (Run every minute)
>>>         Concurrent tasks: 1
>>> Properties:
>>>         Database Connection Pooling Service: DBmysql
>>>         SQL select query: Select * from user
>>>         Max Wait Time: 0 seconds
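
NiFi's CRON-driven scheduling uses Quartz-style expressions, in which the first field is seconds. That is why `* * * * * ?` fires every second rather than every minute, while `0 * * * * ?` would fire once per minute. The small helper below only labels the fields of such an expression; `label_quartz_fields` is a hypothetical name for illustration and is not part of NiFi or Quartz.

```python
# Field order of a Quartz-style cron expression; the year field is optional.
QUARTZ_FIELDS = (
    "seconds", "minutes", "hours",
    "day_of_month", "month", "day_of_week", "year",
)


def label_quartz_fields(expression):
    """Map each whitespace-separated field to its Quartz field name."""
    parts = expression.split()
    if len(parts) not in (6, 7):
        raise ValueError("expected 6 or 7 fields, got %d" % len(parts))
    return dict(zip(QUARTZ_FIELDS, parts))


every_second = label_quartz_fields("* * * * * ?")  # seconds field is "*"
every_minute = label_quartz_fields("0 * * * * ?")  # seconds field is "0"
```

A wildcard in the seconds position matches every second, so the processor runs (and re-reads the whole table) once per second for as long as the flow is running.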
>>>
>>> Then I used a ConvertAvroToJSON processor and a PutFile processor.
>>>
>>> If I run the flow, it creates 26 flow files, and each of them contains
>>> all entries of the table as JSON.
>>>
>>> My goal is to extract the table once, so that the entries are only
>>> created once as JSON, one per row, not 26 times.
>>> My understanding was that each row of the table would be one flow file,
>>> and therefore each row of the table would become one JSON file on disk
>>> (using PutFile).
>>>
>>> But it seems that this is not right. What happens if I have millions of
>>> entries in such a table? Will this be done with one flow file?
>>>
>>> How would I configure NiFi to extract the table once?
>>>
>>> It would be great if somebody could help me with this.
>>>
>>>
>>>  BR
>>> Ralf
>>
>>
>>
>>
>
