Hi Luca,

This approach causes each document's binary data to be read more than once. I think that is expensive, especially if there are a lot of values for a row.

Instead I think something more like ACLs will be needed -- that is, a separate query for each multi-valued field. This is more work, but it would work much better. I will create a ticket to add this to the JDBC connector, but it won't happen for a while.

Karl
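(For illustration only: a minimal sketch of what a separate value query per multi-valued field could look like, reusing the persons/eyes example from the thread below. The table, column, and field names and the helper class are hypothetical, and this is plain JDBC plus ManifoldCF's RepositoryDocument, not the actual JDBC connector code.)

---------------
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.ArrayList;
import java.util.List;

import org.apache.manifoldcf.agents.interfaces.RepositoryDocument;

public class MultiValueFieldSketch {

  /** Fetch every value of one multi-valued field for a single document ID
      and attach the values to the RepositoryDocument as an array. */
  public static void addEyes(Connection conn, String personId, RepositoryDocument rd)
      throws Exception {
    List<String> values = new ArrayList<>();
    // Hypothetical one-to-many table from the example further down the thread.
    String sql = "SELECT eye FROM eyes WHERE person_id = ?";
    try (PreparedStatement ps = conn.prepareStatement(sql)) {
      ps.setString(1, personId);
      try (ResultSet rs = ps.executeQuery()) {
        while (rs.next()) {
          values.add(rs.getString(1));   // e.g. "eye left", "eye right"
        }
      }
    }
    // Passing an array makes the field multi-valued.
    rd.addField("eyes", values.toArray(new String[0]));
  }
}
---------------

Run once per document ID, this costs one extra query per multi-valued field, but the document's binary content is never re-read.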
On Fri, May 6, 2016 at 9:40 AM, Luca Alicata <[email protected]> wrote:

> I've decompiled the Java connector and modified the code in this way:
>
> In processDocuments I can see that all the rows of the query result arrive (including the multi-value rows), but in the loop that parses the documents, after the first document with a given ID all the others with the same ID are skipped.
> So I removed the check that prevents processing further documents with the same ID, and I modified the method that stores metadata so that multi-value data is stored as an array in the metadata mapping.
>
> I attached the code to this e-mail. You can find a comment starting with "---" that I inserted just now for you.
>
> Thanks,
> L. Alicata
>
> 2016-05-06 15:25 GMT+02:00 Karl Wright <[email protected]>:
>
>> Ok, it's now clear what you are looking for, but it is still not clear how we'd integrate that into the JDBC connector. How did you do this when you modified the connector for 1.8?
>>
>> Karl
>>
>> On Fri, May 6, 2016 at 9:21 AM, Luca Alicata <[email protected]> wrote:
>>
>>> Hi Karl,
>>> sorry for my English :).
>>> I mean that when I extract values with a query that joins two tables in a one-to-many relationship, the result set returned by the connector contains only one pair from the two tables.
>>>
>>> For example:
>>> Table A with persons
>>> Table B with eyes
>>>
>>> As the result of the join I expect two rows, like:
>>> person 1, eye left
>>> person 1, eye right
>>>
>>> but the connector returns only one row:
>>> person 1, eye left
>>>
>>> I hope it's clearer now.
>>>
>>> PS: this is the sentence in the ManifoldCF documentation that describes the limitation (
>>> https://manifoldcf.apache.org/release/release-2.3/en_US/end-user-documentation.html#jdbcrepository
>>> ):
>>> ------
>>> There is currently no support in the JDBC connection type for natively handling multi-valued metadata.
>>> ------
>>>
>>> Thanks,
>>> L. Alicata
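(Again for illustration only: a minimal sketch of the row-grouping approach described in the messages above, where the duplicate-ID rows produced by the join are kept and collapsed into one multi-valued entry per document instead of being skipped. Column names are hypothetical and the code is independent of the connector internals.)

---------------
import java.sql.ResultSet;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GroupRowsByIdSketch {

  /** Collapse rows like (person 1, eye left) and (person 1, eye right)
      into {person 1 -> [eye left, eye right]}. */
  public static Map<String, List<String>> collect(ResultSet rs) throws Exception {
    Map<String, List<String>> valuesById = new LinkedHashMap<>();
    while (rs.next()) {
      String id = rs.getString("id");      // hypothetical ID column of the join
      String value = rs.getString("eye");  // hypothetical value column
      valuesById.computeIfAbsent(id, k -> new ArrayList<>()).add(value);
    }
    return valuesById;
  }
}
---------------

Each resulting list can then be handed to addField(name, values.toArray(new String[0])) as in the earlier sketch.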
>>> 2016-05-06 15:10 GMT+02:00 Karl Wright <[email protected]>:
>>>
>>>> Hi Luca,
>>>>
>>>> It is not clear what you mean by "multi value extraction" using the JDBC connector. The JDBC connector allows collection of primary binary content as well as metadata from a database row. So maybe if you can explain what you need beyond that, it would help.
>>>>
>>>> Thanks,
>>>> Karl
>>>>
>>>> On Fri, May 6, 2016 at 9:04 AM, Luca Alicata <[email protected]> wrote:
>>>>
>>>>> Hi Karl,
>>>>> thanks for the information. Fortunately, on another JBoss instance I have an old single-process ManifoldCF configuration that I had dismissed. For the moment I'm testing these jobs with it, and if it works fine I can use it just for this job, in production as well. Maybe later, if I can, I'll try to look into the problem that stops the agents.
>>>>>
>>>>> I'll take advantage of this discussion to ask whether multi-value extraction from the DB is considered possible future work, because I've been using the Generic Connector to work around this limitation of the JDBC connector. In fact, with ManifoldCF 1.8 I had modified the connector to support this behavior (in addition to parsing BLOB files), but when I upgraded ManifoldCF, rather than rewrite the connector for the new version I decided to use the Generic Connector with an application that does the work of extracting the data from the DB.
>>>>>
>>>>> Thanks,
>>>>> L. Alicata
>>>>>
>>>>> 2016-05-06 14:42 GMT+02:00 Karl Wright <[email protected]>:
>>>>>
>>>>>> Hi Luca,
>>>>>>
>>>>>> If you do a lock clean and the process still stops, then the locks are not the problem.
>>>>>>
>>>>>> One way we can drill down into the problem is to get a thread dump of the agents process after it stops. The thread dump must be of the agents process, not any of the others.
>>>>>>
>>>>>> FWIW, the generic connector is not well supported; the person who wrote it is still a committer but is not actively involved in MCF development at this time. I suspect that the problem may have to do with how that connector deals with exceptions or errors, but I am not sure.
>>>>>>
>>>>>> Thanks,
>>>>>> Karl
>>>>>>
>>>>>> On Fri, May 6, 2016 at 8:38 AM, Luca Alicata <[email protected]> wrote:
>>>>>>
>>>>>>> Hi Karl,
>>>>>>> I've just tried running lock-clean after the agents stop working, obviously after stopping the process. After that, jobs start correctly again, but the second time I start a job with a lot of data (or sometimes the third time), the agents stop again.
>>>>>>>
>>>>>>> Unfortunately it's difficult, for the moment, to start using Zookeeper in this environment. Would it fix the fact that the agents stop while working, or would it only help with cleaning dangling agent locks when I restart the process?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> L. Alicata
>>>>>>>
>>>>>>> 2016-05-06 14:15 GMT+02:00 Karl Wright <[email protected]>:
>>>>>>>
>>>>>>>> Hi Luca,
>>>>>>>>
>>>>>>>> With file-based synchronization, if you kill any of the processes involved, you will need to execute the lock-clean procedure to make sure you have no dangling locks in the file system:
>>>>>>>>
>>>>>>>> - shut down all MCF processes (except the database)
>>>>>>>> - run the lock-clean script
>>>>>>>> - start your MCF processes back up
>>>>>>>>
>>>>>>>> I suspect what you are seeing is related to this.
>>>>>>>>
>>>>>>>> Also, please consider using Zookeeper instead, since it is more robust about cleaning out dangling locks.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Karl
>>>>>>>>
>>>>>>>> On Fri, May 6, 2016 at 8:06 AM, Luca Alicata <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi Karl,
>>>>>>>>> thanks for the help.
>>>>>>>>> In my case I have only one instance of MCF running, with both types of job (SP and Generic), and so I have only one properties file (attached).
>>>>>>>>> For information, I'm using the multiprocess-file configuration with Postgres.
>>>>>>>>>
>>>>>>>>> Do you have other suggestions? Do you need more information that I can give you?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> L. Alicata
>>>>>>>>>
>>>>>>>>> 2016-05-06 12:55 GMT+02:00 Karl Wright <[email protected]>:
>>>>>>>>>
>>>>>>>>>> Hi Luca,
>>>>>>>>>>
>>>>>>>>>> Do you have multiple independent MCF clusters running at the same time? It sounds like you do: you have SP on one, and Generic on another. If so, you will need to be sure that the synchronization you are using (either Zookeeper or file-based) does not overlap. Each cluster needs its own synchronization. If there is overlap, then doing things with one cluster may cause the other cluster to hang. This also means you have to have different properties files for the two clusters, of course.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Karl
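(For reference, the synchronization options discussed above are selected in properties.xml: file-based locking with a per-cluster directory, or Zookeeper-based locking. The snippet below is a sketch with example values only; the exact property names should be verified against the ManifoldCF deployment documentation for your release.)

---------------
<!-- File-based locking: each cluster must point at its own directory. -->
<property name="org.apache.manifoldcf.synchdirectory" value="C:\manifoldcf\synch-cluster1"/>

<!-- Or Zookeeper-based locking instead (example values): -->
<property name="org.apache.manifoldcf.lockmanagerclass"
          value="org.apache.manifoldcf.core.lockmanager.ZooKeeperLockManager"/>
<property name="org.apache.manifoldcf.zookeeper.connectstring" value="localhost:2181"/>
<property name="org.apache.manifoldcf.zookeeper.sessiontimeout" value="300000"/>
---------------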
>>>>>>>>>> On Fri, May 6, 2016 at 4:32 AM, Luca Alicata <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>> I'm using ManifoldCF 2.2 with the multi-process configuration, in a JBoss instance on Windows Server 2012, and I have a set of jobs that work with SharePoint (SP) or the Generic Connector (GC), which gets files from a DB.
>>>>>>>>>>> With SP I have no problem, while with GC and a lot of documents (one job with 47k and another with 60k) the seeding process sometimes does not finish, because the agents seem to stop (although the Java process is still alive).
>>>>>>>>>>> After that, if I try to start any other job, it does not start, as if the agents were stopped.
>>>>>>>>>>>
>>>>>>>>>>> Other times these jobs work correctly, and once they even worked correctly running at the same time.
>>>>>>>>>>>
>>>>>>>>>>> For information:
>>>>>>>>>>>
>>>>>>>>>>> - On JBoss there are only ManifoldCF and the Generic Repository application.
>>>>>>>>>>>
>>>>>>>>>>> - On the same virtual server there is another JBoss instance, with a Solr instance and a web application.
>>>>>>>>>>>
>>>>>>>>>>> - I've checked whether it was some kind of memory problem, but that's not the case.
>>>>>>>>>>>
>>>>>>>>>>> - A GC job with about 23k seeds always works, at least in the tests I've done.
>>>>>>>>>>>
>>>>>>>>>>> - On a local JBoss instance with ManifoldCF and the Generic Repository application, I have not had this problem.
>>>>>>>>>>>
>>>>>>>>>>> This is the only recurring information I've seen in manifold.log:
>>>>>>>>>>> ---------------
>>>>>>>>>>> Connection 0.0.0.0:62755<-><ip-address>:<port> shut down
>>>>>>>>>>> Releasing connection org.apache.http.impl.conn.ManagedClientConnectionImpl@6c98c1bd
>>>>>>>>>>> ---------------
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> L. Alicata
