Re: [BioMart Users] Joining datasets: 0.8 rc6

Tony R Miqueli Mon, 28 Nov 2011 10:13:34 -0800

That did it.  Thank you all so, so much!

Tony


-----Original Message-----
From: Junjun Zhang [mailto:[email protected]] 
Sent: Monday, November 28, 2011 11:14 AM
To: Tony R Miqueli; [email protected]
Subject: Re: [BioMart Users] Joining datasets: 0.8 rc6

Hi Tony,

In the current implementation, the join is done as you described; no
special handling is given to Oracle.

One way to get around that is to change the batch size so that it's less
than 1000. Find the following line in QueryRunner.java:

public static final int BATCH_SIZE = 5000;

Change it to, say, 800. The second query will then receive up to 800 rows
of results from the first query at a time and add them to its IN list, so
the list stays under Oracle's limit.
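
To illustrate the idea, here is a minimal sketch of that batching. It is
not BioMart's actual code: the class InListBatcher and its methods
partition and buildQuery are hypothetical names made up for this example,
and only the value 800 comes from the suggestion above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, NOT part of BioMart: shows how join keys from the
// first query can be split into batches so that each IN list sent to
// Oracle stays under the 1000-expression limit (ORA-01795).
final class InListBatcher {
    // Assumed value, mirroring the 800 suggested in this thread.
    static final int BATCH_SIZE = 800;

    // Split ids into consecutive sublists of at most batchSize elements.
    static <T> List<List<T>> partition(List<T> ids, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < ids.size(); i += batchSize) {
            int end = Math.min(i + batchSize, ids.size());
            batches.add(new ArrayList<>(ids.subList(i, end)));
        }
        return batches;
    }

    // Build a parameterized query for one batch, with one "?" placeholder
    // per key. Table and column names are supplied by the caller.
    static String buildQuery(String table, String column, int batchLength) {
        StringBuilder sql = new StringBuilder();
        sql.append("SELECT * FROM ").append(table)
           .append(" WHERE ").append(column).append(" IN (");
        for (int i = 0; i < batchLength; i++) {
            if (i > 0) {
                sql.append(", ");
            }
            sql.append("?");
        }
        return sql.append(")").toString();
    }
}
```

With 2000 keys from the first query, partition would yield three batches
(800, 800 and 400 keys), and the second datasource would be queried once
per batch.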

Hope this helps,

Junjun



From:  Tony R Miqueli <[email protected]>
Date:  Sun, 27 Nov 2011 22:49:48 -0500
To:  "[email protected]" <[email protected]>
Subject:  [BioMart Users] Joining datasets:  0.8 rc6


>I am trying to set up a BioMart 0.8 instance that has datasources in two
>separate Oracle databases.   Each database instance has just one table
>with two columns, a unique identifier and a varchar column.  The unique
>identifier columns are
> the link between the two datasets.  I am trying to set up a simple join
>between these two tables.  According to the rc6 documentation, it appears
>that creating a pointer attribute would create this link and effectively
>'inner join' these two datasets at runtime.
> I'm able to create the pointer attribute in the target access point,
>set up the datasource link, start the webserver and attempt to run the
>query.  From the web interface I see the 3 attributes (the shared unique
>ID, varchar column from datasource 'A', varchar
> column from datasource 'B'), select them and run the query...which errors
>out.  I ran this query through the Java API in Eclipse and while
>debugging noticed that an exception is thrown when trying to query the
>second datasource:
>ORA-01795: maximum number of expressions in a list is 1000.
> 
>Stepping through the code in the debugger, it appears that it's running a
>query to get all of the results from datasource
> 'A', and then turning the ID column into a list and using it in a
>subsequent query to get the necessary resultset from datasource 'B'...but
>there are well over 1000 records I am trying to join.  Is there a way
>around this so that the query mechanism doesn't
> rely on using an IN(<list>) to join these two datasets?
> 
>Thanks so much in advance.  I really like what I see in the new release
>and am looking forward to getting this up
> and running!
> 
>Thanks again!
> 
>Tony

_______________________________________________
Users mailing list
[email protected]
https://lists.biomart.org/mailman/listinfo/users
