Hi Vivek,

Looks like you have run into a couple of bugs. You may want to create the following JIRAs (and look into fixing them?):
1) JIRA for your case 1. One suggestion is to add another trigger for the roll-up event (say, when the number of .tmp files exceeds a certain configurable number, instead of just the empty-window count).
2) JIRA for the two NPEs: I tried to look into these to determine whether they are bugs or configuration issues but wasn't able to. Maybe you can open two JIRAs for the two different NPE stack traces.
3) Enhancement request: the ability to set a property of an operator inside a module, e.g. the modulename$operatorname pattern that you mentioned. I think the quickest path is for you to address this and submit it to Malhar for review and commit.

Sanjay

On Sat, May 20, 2017 at 8:35 AM, Vivek Bhide <bhide.vi...@gmail.com> wrote:
> Hi Sanjay,
>
> After working for some more time I could find a pattern for how and when
> the code breaks, but in any situation it certainly doesn't work. Below are
> my observations so far:
>
> 1. Regarding setting a parameter: as you said, the app_name is optional;
> the reason is that you don't expect to have more than one streaming
> application in your project. I think the app_name will matter if there is
> more than one streaming application in your .apa with properties in the
> same .xml file.
> 2. I tried setting maxWindowsWithNoData to a very high value, but the only
> way I could set it is by using * in place of the operator name. The reason
> is that HiveOutputModule doesn't accept it as a parameter; it belongs to
> one of the operators inside HiveOutputModule, i.e.
> AbstractFSRollingOutputOperator. At this point there is no provision for
> setting parameters that are embedded in a module, even with the
> <modulename$operatorname> pattern; it is the module's responsibility to
> accept the parameter at level 1 from the properties file and set it on the
> level 2 operator when building the DAG. I could verify this with a quick
> test case for another module that I have built in my project and can share
> the code base for the same.
> 3.
> File roll-up depends on two params: maxWindowsWithNoData (from
> AbstractFSRollingOutputOperator) and maxLength (from HiveOutputModule).
>
> Case 1: maxWindowsWithNoData set to a high number and maxLength = 50 MB
> (default 128 MB).
> Result: The file roll-up doesn't happen until the empty-window count
> reaches that point. I could see that there were multiple 50 MB files
> created under the <hdfs_dir>/<yarn_app_id>/10/<partition_col> location,
> but none of the files rolled from .tmp to a final file even after running
> the app for more than 10 hours.
>
> Case 2: maxWindowsWithNoData set to 480 (4 mins) and maxLength = 50 MB
> (default 128 MB).
> Result: If the maxLength limit is reached first, I get the exception
> below (a NullPointerException again, but with a different stack trace);
> if maxWindowsWithNoData is reached first, I get the same NullPointerException
> that I reported in the first place.
>
> 2017-05-19 10:02:37,401 INFO stram.StreamingContainerParent (StreamingContainerParent.java:log(170)) - child msg: [container_e3092_1491920474239_131026_01_000016] Entering heartbeat loop.. context: PTContainer[id=9(container_e3092_1491920474239_131026_01_000016),state=ALLOCATED,operators=[PTOperator[id=10, name=hiveOutput$fsRolling,state=PENDING_DEPLOY]]]
> 2017-05-19 10:02:38,414 INFO stram.StreamingContainerManager (StreamingContainerManager.java:processHeartbeat(1486)) - Container container_e3092_1491920474239_131026_01_000016 buffer server: d-d7zvfz1.target.com:45373
> 2017-05-19 10:02:38,725 INFO stram.StreamingContainerParent (StreamingContainerParent.java:log(170)) - child msg: Stopped running due to an exception.
> java.lang.NullPointerException
>   at com.datatorrent.lib.io.fs.AbstractFileOutputOperator.requestFinalize(AbstractFileOutputOperator.java:742)
>   at com.datatorrent.lib.io.fs.AbstractFileOutputOperator.rotate(AbstractFileOutputOperator.java:883)
>   at com.datatorrent.contrib.hive.AbstractFSRollingOutputOperator.rotateCall(AbstractFSRollingOutputOperator.java:186)
>   at com.datatorrent.contrib.hive.AbstractFSRollingOutputOperator.endWindow(AbstractFSRollingOutputOperator.java:227)
>   at com.datatorrent.stram.engine.GenericNode.processEndWindow(GenericNode.java:153)
>   at com.datatorrent.stram.engine.GenericNode.run(GenericNode.java:397)
>   at com.datatorrent.stram.engine.StreamingContainer$2.run(StreamingContainer.java:1428)
> context: PTContainer[id=9(container_e3092_1491920474239_131026_01_000016),state=ACTIVE,operators=[PTOperator[id=10, name=hiveOutput$fsRolling,state=PENDING_DEPLOY]]]
>
> In any case the code always fails. I was really excited to have this
> incorporated, but for now I have set it aside and am sticking to a simple
> HDFS sink. I will work on it again to find out more as time permits.
>
> Let me know your thoughts on this.
>
> Regards,
> Vivek
>
> --
> View this message in context: http://apache-apex-users-list.78494.x6.nabble.com/NullPointerException-at-AbstractFSRollingOutputOperator-while-using-HiveOutputModule-tp1625p1639.html
> Sent from the Apache Apex Users list mailing list archive at Nabble.com.
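For reference, the wildcard workaround described in observation 2 would look roughly like this in the application's properties.xml. The operator name hiveOutput$fsRolling is taken from the log context above; the property path follows the usual Apex dt.operator.&lt;name&gt;.prop.&lt;property&gt; convention, but treat the exact entries as a sketch rather than a verified configuration:

```xml
<configuration>
  <!-- Wildcard workaround: '*' matches every operator, including the
       fsRolling operator nested inside HiveOutputModule. -->
  <property>
    <name>dt.operator.*.prop.maxWindowsWithNoData</name>
    <value>480</value>
  </property>

  <!-- What one would like to write per the enhancement request (point 3),
       but which is not supported today for operators nested in a module:

  <property>
    <name>dt.operator.hiveOutput$fsRolling.prop.maxWindowsWithNoData</name>
    <value>480</value>
  </property>
  -->
</configuration>
```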
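The extra roll-up trigger suggested in point 1 (roll over when either the empty-window count or the number of open .tmp files crosses a configurable threshold) could be sketched as below. All names here (RollupTrigger, maxTmpFiles, onWindowEnd) are hypothetical and not actual Malhar API; in practice the check would live in the operator's endWindow logic:

```java
// Hypothetical sketch of a combined roll-up trigger. Not Malhar API;
// illustrates the decision logic only.
class RollupTrigger {
  private final int maxWindowsWithNoData; // existing knob
  private final int maxTmpFiles;          // proposed new knob
  private int emptyWindowCount;
  private int tmpFileCount;

  RollupTrigger(int maxWindowsWithNoData, int maxTmpFiles) {
    this.maxWindowsWithNoData = maxWindowsWithNoData;
    this.maxTmpFiles = maxTmpFiles;
  }

  // Called once per streaming window with what was observed.
  void onWindowEnd(boolean sawData, int openTmpFiles) {
    emptyWindowCount = sawData ? 0 : emptyWindowCount + 1;
    tmpFileCount = openTmpFiles;
  }

  // Roll over when EITHER threshold is crossed, so long-lived .tmp
  // files get finalized even while data keeps trickling in.
  boolean shouldRotate() {
    return emptyWindowCount >= maxWindowsWithNoData
        || tmpFileCount >= maxTmpFiles;
  }
}
```

With maxWindowsWithNoData = 480 and maxTmpFiles = 10, a window that leaves 12 .tmp files open would trigger a roll-up immediately instead of waiting out 480 empty windows.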