Hi Vivek,

Which version of malhar and malhar-hive are you using? It may help to use the latest version (3.7.0), as a couple of fixes have gone in recently that might fix your issue (APEXMALHAR-2394 <https://github.com/apache/apex-malhar/commit/4587a55c0bc7b178ea6fc13a49db4cd7b1ac1ebb> and APEXMALHAR-2342 <https://github.com/apache/apex-malhar/commit/1df0b523ad522595f6bb30ff76aacb57d239807f>).

Also, with regard to my suggestion (3), it seems this is by design, i.e. operator properties inside a module are not meant to be exposed unless the module writer intends to expose them explicitly (as the module's own properties).

Sanjay

On Mon, May 22, 2017 at 12:01 PM, Sanjay Pujare <san...@datatorrent.com> wrote:

> Hi Vivek
>
> Looks like you have run into a couple of bugs. You may want to create the
> following JIRAs (and look into fixing them?)
>
> 1) JIRA for your case 1. What can be suggested is to add another trigger
> for the roll up event (say when the number of tmp files exceeds a certain
> configurable number instead of just the empty window count).
>
> 2) JIRA for the 2 NPEs: I tried to look into these to determine if these
> are bugs or configuration issues but wasn't able to. Maybe you can open 2
> JIRAs for the 2 different NPE stack traces.
>
> 3) Enhancement request: ability to set a property of an operator inside a
> module e.g. modulename$operatorname that you mentioned.
>
> I think what will be quickest is for you to address this and submit it to
> malhar for review and commit.
>
> Sanjay
>
> On Sat, May 20, 2017 at 8:35 AM, Vivek Bhide <bhide.vi...@gmail.com> wrote:
>
>> Hi Sanjay,
>>
>> After working on this for some more time I could find a pattern for how
>> and when the code breaks, but it does not work in any situation. Below
>> are my observations so far.
>>
>> 1. Regarding setting a parameter, as you said, the app_name is optional,
>> and the reason is that you don't expect to have more than one streaming
>> application in your project. I think the app_name will matter if there
>> is more than one streaming application in your .apa with properties in
>> the same .xml file.
>>
>> 2. I tried setting maxWindowsWithNoData to a very high value, but the
>> only way I could set it was by using * in place of the operator name.
>> The reason is that HiveOutputModule doesn't accept it as a parameter;
>> instead it is a parameter of one of HiveOutputModule's operators, i.e.
>> AbstractFSRollingOutputOperator. At this point there is no provision for
>> setting parameters that are embedded in a module, even when using the
>> <modulename$operatorname> pattern, and it is the module's responsibility
>> to accept it as a level 1 operator from the properties file and set it
>> on the level 2 operator when it is building the DAG. I could verify this
>> with a quick test case for another module that I have built in my
>> project and can share the code base for the same.
>>
>> 3. File rollup depends on 2 params: maxWindowsWithNoData (from
>> AbstractFSRollingOutputOperator) and maxLength from HiveModule.
>>
>> case 1: maxWindowsWithNoData set to a high number and maxLength = 50MB
>> (default 128MB)
>> Result: In this case, the file rollup doesn't happen until the empty
>> window count reaches that number. I could see that there were multiple
>> 50 MB files created under <hdfs_dir>/<yarn_app_id>/10/<partition_col>,
>> but none of the files rolled up from .tmp to the final file even after
>> running the app for more than 10 hours.
>>
>> case 2: maxWindowsWithNoData set to 480 (4 mins) and maxLength = 50MB
>> (default 128MB)
>> Result: In this case, if the maxLength limit is reached first I get the
>> exception below (a NullPointerException again, but with a different
>> stack trace), and if maxWindowsWithNoData is reached first then I get
>> the same NullPointerException that I reported in the first place.
>>
>> 2017-05-19 10:02:37,401 INFO stram.StreamingContainerParent (StreamingContainerParent.java:log(170)) - child msg: [container_e3092_1491920474239_131026_01_000016] Entering heartbeat loop.. context: PTContainer[id=9(container_e3092_1491920474239_131026_01_000016),state=ALLOCATED,operators=[PTOperator[id=10,name=hiveOutput$fsRolling,state=PENDING_DEPLOY]]]
>> 2017-05-19 10:02:38,414 INFO stram.StreamingContainerManager (StreamingContainerManager.java:processHeartbeat(1486)) - Container container_e3092_1491920474239_131026_01_000016 buffer server: d-d7zvfz1.target.com:45373
>> 2017-05-19 10:02:38,725 INFO stram.StreamingContainerParent (StreamingContainerParent.java:log(170)) - child msg: Stopped running due to an exception. java.lang.NullPointerException
>>     at com.datatorrent.lib.io.fs.AbstractFileOutputOperator.requestFinalize(AbstractFileOutputOperator.java:742)
>>     at com.datatorrent.lib.io.fs.AbstractFileOutputOperator.rotate(AbstractFileOutputOperator.java:883)
>>     at com.datatorrent.contrib.hive.AbstractFSRollingOutputOperator.rotateCall(AbstractFSRollingOutputOperator.java:186)
>>     at com.datatorrent.contrib.hive.AbstractFSRollingOutputOperator.endWindow(AbstractFSRollingOutputOperator.java:227)
>>     at com.datatorrent.stram.engine.GenericNode.processEndWindow(GenericNode.java:153)
>>     at com.datatorrent.stram.engine.GenericNode.run(GenericNode.java:397)
>>     at com.datatorrent.stram.engine.StreamingContainer$2.run(StreamingContainer.java:1428)
>> context: PTContainer[id=9(container_e3092_1491920474239_131026_01_000016),state=ACTIVE,operators=[PTOperator[id=10,name=hiveOutput$fsRolling,state=PENDING_DEPLOY]]]
>>
>> In any case the code always fails. I was really excited to have this
>> incorporated, but for now I have set it aside and am sticking to a
>> simple HDFS sink. Will work on it again to find more as time permits.
>>
>> Let me know your thoughts on this.
>>
>> Regards
>> Vivek
>>
>> --
>> View this message in context: http://apache-apex-users-list.78494.x6.nabble.com/NullPointerException-at-AbstractFSRollingOutputOperator-while-using-HiveOutputModule-tp1625p1639.html
>> Sent from the Apache Apex Users list mailing list archive at Nabble.com.
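
For reference, here is a minimal sketch of the pattern discussed under point (3) and in Vivek's observation 2: the module accepts the property as its own and copies it onto the inner operator when it builds the DAG. This is plain Java for illustration only. RollingWriter, OutputModule and buildInnerOperators are hypothetical stand-ins, not Apex or Malhar classes, and the default value of 100 is an arbitrary assumption.

```java
// Hypothetical stand-in for the inner (level 2) operator. NOT a Malhar class.
class RollingWriter {
    // Arbitrary default chosen for this illustration only.
    private int maxWindowsWithNoData = 100;

    void setMaxWindowsWithNoData(int value) {
        this.maxWindowsWithNoData = value;
    }

    int getMaxWindowsWithNoData() {
        return maxWindowsWithNoData;
    }
}

// Hypothetical stand-in for the module (level 1). NOT the Apex Module interface.
class OutputModule {
    // The module exposes the setting as its own property ...
    private int maxWindowsWithNoData = 100;

    void setMaxWindowsWithNoData(int value) {
        this.maxWindowsWithNoData = value;
    }

    int getMaxWindowsWithNoData() {
        return maxWindowsWithNoData;
    }

    // ... and forwards it to the inner operator while building its part of the DAG.
    RollingWriter buildInnerOperators() {
        RollingWriter writer = new RollingWriter();
        writer.setMaxWindowsWithNoData(maxWindowsWithNoData);
        return writer;
    }
}

class ModulePropertyDemo {
    public static void main(String[] args) {
        OutputModule module = new OutputModule();
        module.setMaxWindowsWithNoData(480);   // value set at module level
        RollingWriter writer = module.buildInnerOperators();
        System.out.println(writer.getMaxWindowsWithNoData()); // prints 480
    }
}
```

With this shape, the value only needs to be addressed to the module in the properties file, and the inner operator never has to be named from outside.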